Fix test_vllm_npu_worker_class_resolves: tolerate version mismatch #1

Open
UsernameFull wants to merge 109 commits into main from npu_ci


Conversation

@UsernameFull
Owner

Test fix for a version incompatibility between the installed vllm_ascend and the import path the test expects.
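
One way a test can tolerate this kind of mismatch is to try several candidate import paths and accept the first that resolves. The sketch below is an illustration only, not the code in this PR: `resolve_first` is a hypothetical helper, and the dotted paths passed to it are placeholders rather than real vllm_ascend module paths.

```python
import importlib


def resolve_first(candidates):
    """Return the first object that resolves from a list of dotted paths, else None.

    Hypothetical helper for illustration; each candidate is "package.module.Attr".
    """
    for dotted in candidates:
        module_name, _, attr = dotted.rpartition(".")
        try:
            module = importlib.import_module(module_name)
        except (ImportError, ValueError):
            # ImportError: module missing or moved in this installed version.
            # ValueError: the candidate had no module part at all.
            continue
        obj = getattr(module, attr, None)
        if obj is not None:
            return obj
    return None


# Placeholder paths: the first does not exist, the second is in the stdlib.
worker_cls = resolve_first(["no_such_pkg_xyz.worker.Worker", "collections.OrderedDict"])
```

In a pytest-based test, a `None` result could be turned into `pytest.skip(...)` instead of a failure, so that an unsupported version is reported as skipped rather than broken.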

UsernameFull and others added 30 commits January 28, 2026 20:51
Co-Authored-By: chengengru.cgr <chengengru.cgr@taobao.com>
Co-Authored-By: fengjingxuan.fjx <fengjingxuan.fjx@alibaba-inc.com>
Co-Authored-By: ft498870 <ft498870@taobao.com>
Co-Authored-By: heyancheng.hyc <heyancheng.hyc@taobao.com>
Co-Authored-By: hongzhen.yj <hongzhen.yj@alibaba-inc.com>
Co-Authored-By: huangju.hj <huangju.hj@alibaba-inc.com>
Co-Authored-By: jiamang.wang <jiamang.wang@alibaba-inc.com>
Co-Authored-By: scott.lxy <scott.lxy@taobao.com>
Co-Authored-By: shenjingyu.sjy <shenjingyu.sjy@alibaba-inc.com>
Co-Authored-By: shenliao.sla <shenliao.sla@taobao.com>
Co-Authored-By: tianhe.lzd <tianhe.lzd@alibaba-inc.com>
Co-Authored-By: weixun.wwx <weixun.wwx@alibaba-inc.com>
Co-Authored-By: wzy496492 <wzy496492@alibaba-inc.com>
Co-Authored-By: xiongshaopan.xsp <xiongshaopan.xsp@alibaba-inc.com>
Co-Authored-By: xuehuanran.xhr <xuehuanran.xhr@alibaba-inc.com>
Co-Authored-By: zhaohaizhou.zhz <zhaohaizhou.zhz@alibaba-inc.com>
Co-Authored-By: bzd02333762 <bzd02333762@alibaba-inc.com>
Co-authored-by: beiyue.lj <beiyue.lj@alibaba-inc.com>
Co-Authored-By: lt511297 <lt511297@alibaba-inc.com>
Removed the call to upload checkpoint to MOS after saving.
Fix typo: use `group_size` instead of `gropu_size`
Previously, is_last_step was passed via **kwargs and transparently
forwarded to DeepSpeedEngine.save_checkpoint(), which does not accept
this argument, causing a TypeError at checkpoint time.

Fix by explicitly declaring is_last_step=None in the signature (consistent
with megatron_strategy and fsdp2_strategy), and applying the same
async_upload guard logic as the other strategies.

Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
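
The `is_last_step` failure described in the commit message above can be reproduced with a small stand-in. This is a sketch under stated assumptions: `FakeEngine` and `Strategy` are hypothetical classes, not the DeepSpeed or project code, and the async_upload guard is left out because its logic is not shown here.

```python
class FakeEngine:
    """Hypothetical stand-in for an engine whose save_checkpoint rejects unknown keywords."""

    def save_checkpoint(self, save_dir, tag=None):
        return (save_dir, tag)


class Strategy:
    """Hypothetical strategy wrapper showing the buggy and fixed signatures."""

    def __init__(self, engine):
        self.engine = engine

    def save_checkpoint_buggy(self, save_dir, **kwargs):
        # is_last_step rides along inside kwargs and reaches the engine,
        # which raises TypeError for the unexpected keyword.
        return self.engine.save_checkpoint(save_dir, **kwargs)

    def save_checkpoint_fixed(self, save_dir, is_last_step=None, **kwargs):
        # Declaring is_last_step explicitly removes it from kwargs,
        # so only arguments the engine understands are forwarded.
        return self.engine.save_checkpoint(save_dir, **kwargs)


strategy = Strategy(FakeEngine())
result = strategy.save_checkpoint_fixed("/tmp/ckpt", is_last_step=True, tag="step1")

try:
    strategy.save_checkpoint_buggy("/tmp/ckpt", is_last_step=True, tag="step1")
    buggy_raised = False
except TypeError:
    buggy_raised = True
```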
- Fix socket resource leak in get_node_ip() by properly closing socket
- Replace list comprehension with proper loop in destroy_placement_group() for better error handling

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
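
The two fixes listed in the commit above follow common patterns, sketched below. This is an illustration and not the project's implementation: the probe address, the loopback fallback, and the `destroy_all` helper with its `destroy` callback are assumptions made for the example.

```python
import socket


def get_node_ip(probe_addr=("8.8.8.8", 80)):
    """Return the local IP used to reach probe_addr, or loopback on failure.

    The with-block closes the socket on every path, including errors.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            # connect() on a UDP socket only selects a route; no packet is sent.
            sock.connect(probe_addr)
            return sock.getsockname()[0]
        except OSError:
            return "127.0.0.1"


def destroy_all(groups, destroy):
    """Call destroy(group) for every group and collect failures.

    An explicit loop lets one failure be recorded without stopping
    cleanup of the remaining groups, which a list comprehension would not.
    """
    failures = []
    for group in groups:
        try:
            destroy(group)
        except Exception as exc:
            failures.append((group, exc))
    return failures
```

A caller could log the returned failures or raise after the loop, depending on how strict cleanup needs to be.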
UsernameFull force-pushed the npu_ci branch 6 times, most recently from 3822431 to 33fd25c (May 25, 2026 07:05)
Add CPU and Ascend NPU CI workflows, vLLM/SGLang NPU compatibility fixes, and CI-stable test adaptations.